Investigating the Microbiota of Infants, Children, and Mothers


Statistical Tests and Analysis Methods Used

  • NMDS
  • Random Forest
  • Wilcoxon Rank Sum
  • Benjamini-Hochberg Multiple Comparison Correction

Reasons for Choosing These Methods

I wanted to improve upon Figure 4 of the Koren et al. 2012 study. Although it is possible to infer from that figure that the microbiota of children becomes more similar to that of their mothers, this is not easy to see from the figure as presented. The original analysis also required matched child and mother sample pairs. First, I wanted to ask whether the phenomenon described in Koren et al. 2012 generalizes across groups (i.e., does not require matched pairs). Second, I wanted to identify which features were important to the model when classifying a microbiota sample as coming from an adult mother or not.

To do this, I chose the Random Forest (RF) machine learning algorithm. Its benefits are that it takes into account the inter-dependency of the OTUs, it does not require the data to be normally distributed, and it can handle zero-inflated data without much difficulty. For this question, these properties make it a better fit than general linear models and feature selection algorithms such as LEfSe.

The prediction function that I used in R (v3.4.1) returns the probability that a sample comes from an adult mother. These probabilities are unlikely to be normally distributed, so to be conservative when testing for significance I used the Wilcoxon Rank Sum Test. To correct for multiple comparisons I used the Benjamini-Hochberg method rather than the Bonferroni correction, since the latter can be overly stringent and produce more false negatives. All P-values reported here have been corrected for multiple comparisons.
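
The testing and correction steps described above can be sketched as follows. This is an illustration only, not the R code used for the analysis: it is written in plain Python, the probability values are made up, and the function names (`midranks`, `rank_sum_p`, `benjamini_hochberg`) are hypothetical. The rank-sum P-value here uses a simple normal approximation with no tie or continuity correction, so it will not match R's output exactly, especially for small groups.

```python
import math


def midranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def rank_sum_p(x, y):
    """Two-sided rank-sum P-value from a normal approximation.

    Assumption: no tie or continuity correction is applied, which is
    simpler than what statistical packages typically do.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # sum of ranks belonging to the first group
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest P-value down so each adjusted value is the
    # minimum of p * m / rank over its own rank and all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical model probabilities of "adult mother" for three groups.
infants = [0.05, 0.10, 0.08, 0.12, 0.07]
four_year_olds = [0.55, 0.62, 0.48, 0.70, 0.58]
mothers = [0.90, 0.95, 0.88, 0.97, 0.92]

labels = ["infants vs 4 year olds", "infants vs mothers", "4 year olds vs mothers"]
raw = [
    rank_sum_p(infants, four_year_olds),
    rank_sum_p(infants, mothers),
    rank_sum_p(four_year_olds, mothers),
]
for label, p_raw, p_adj in zip(labels, raw, benjamini_hochberg(raw)):
    print(f"{label}: raw P = {p_raw:.4f}, BH-adjusted P = {p_adj:.4f}")
```

The Benjamini-Hochberg step scales each sorted P-value by m / rank, where Bonferroni would multiply every P-value by m. That is why it is the less stringent of the two corrections.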

Results of the Analysis

Figure 1: NMDS of All Age Groups Analyzed

There is clear separation between the infants and the adults, but this difference does not seem to hold between the 4-year-old age group and the adults. Even if there were a difference, it would be very hard to spot using this method. The question, then, is whether a different approach can provide more information than what we observe with NMDS.

Figure 2: Transition of Bacterial Community from Child to Adult

Overall, there was a significant difference between the six-month-old infants and every other group with respect to the model probability of being an adult mother (P-value < 0.05). Interestingly, RF gives more granularity than the approach originally used in the manuscript. Although the 4-year-old age group does have an increased probability of being classified as an adult, its probability is significantly lower than that of both the adult mothers during the third trimester (P-value = 1.38e-05) and the adult mothers 1 month after pregnancy (P-value = 0.0016).

Figure 3: Top 30 OTUs within the Model

Although some of the OTUs could only be classified as bacteria at their lowest taxonomic ID, the other taxonomic identifications still show a few interesting things. The OTU identified as most important to the model classifies to a bacterium that is typically found in coral and oceans. This makes me suspect that contamination may have been an issue during the study. Alternatively, because I set the taxonomic ID threshold to a very liberal 60%, it is possible that this is not the true ID of this OTU. In contrast, there do appear to be biologically relevant gut OTUs that are central to this model. Both Bacteroides and Faecalibacterium are known to be resident gut microbes and are important in both health and disease states.